Ad-Hoc Queries over Document Collections - A Case Study (incomplete workshop discussion draft)

Authors

Alexander Löser

Steffen Lutter

Patrick Düssel

Volker Markl

Abstract

We discuss the novel problem of supporting analytical business intelligence queries over web-based textual content, e.g., BI-style reports based on hundreds of thousands of documents from an ad-hoc web search result. Neither conventional search engines nor conventional Business Intelligence and ETL tools address this problem, which lies at the intersection of their capabilities. “Google Squared” and our system GOOLAP.info are examples of such systems. They execute information extraction methods over one or several document collections at query time and integrate the extracted records into a common view or tabular structure. Frequent extraction and object resolution failures cause incomplete records that cannot be joined into a record answering the query. Our focus is the identification of join-reordering heuristics that maximize the number of complete records answering a structured query. With respect to given costs for document extraction, we propose two novel join operations: the multi-way CJ-operator joins records from multiple relationships extracted from a single document, and the two-way join-operator DJ ensures data density by removing incomplete records from results. In a preliminary case study we observe that our join-reordering heuristics improve result size and record density and lower execution costs.
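The abstract describes the two operators only in one sentence each, so the following Python sketch is an illustration under stated assumptions rather than the authors' definitions. The function names `cj_join` and `dj_join`, the record layout (plain dictionaries tagged with a document id), and the `required` parameter are all hypothetical. The sketch assumes that CJ combines records of several relationships found in the same document, and that DJ is an equi-join that keeps only records in which every required attribute has a value.

```python
from itertools import product


def cj_join(relations):
    """Hypothetical multi-way CJ: merge records of several relationships
    that were extracted from the same document.

    relations maps a relationship name to a list of (doc_id, record) pairs,
    where each record is a dict of attribute -> value (None = extraction failed).
    """
    by_doc = {}
    for rel, rows in relations.items():
        for doc_id, rec in rows:
            by_doc.setdefault(doc_id, {}).setdefault(rel, []).append(rec)

    out = []
    for doc_id, per_rel in by_doc.items():
        # A relationship with no record in this document contributes an
        # empty record, so the merged result simply lacks those attributes.
        groups = [per_rel.get(rel, [{}]) for rel in relations]
        for combo in product(*groups):
            merged = {"doc": doc_id}
            for rec in combo:
                merged.update(rec)
            out.append(merged)
    return out


def dj_join(left, right, key, required):
    """Hypothetical two-way DJ: equi-join on `key`, then drop every joined
    record that is missing a value for any attribute in `required`."""
    index = {}
    for r in right:
        if r.get(key) is not None:
            index.setdefault(r[key], []).append(r)

    out = []
    for l in left:
        for r in index.get(l.get(key), []):
            rec = {**l, **r}  # on attribute clashes the right record wins
            if all(rec.get(a) is not None for a in required):
                out.append(rec)
    return out


# Made-up example data: document d2 yields an incomplete "acquired" record.
relations = {
    "acquired": [("d1", {"company": "A", "target": "B"}),
                 ("d2", {"company": "C", "target": None})],
    "ceo": [("d1", {"company": "A", "ceo": "X"})],
}
per_document = cj_join(relations)
headquarters = [{"company": "A", "hq": "Berlin"}, {"company": "C", "hq": "Paris"}]
dense = dj_join(per_document, headquarters, "company", ["target", "ceo", "hq"])
```

In this toy run the record from document d2 survives the CJ step with a missing target and no CEO, and the DJ step then removes it, which is the density-preserving behavior the abstract attributes to DJ.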

Similar resources

Ad-Hoc Queries over Document Collections - A Case Study

Inferred AP: Estimating Average Precision with Incomplete Judgments

In this work, we consider the evaluation of retrieval systems using incomplete relevance information. When the document collection is dynamic, as in the case of web retrieval, new documents are added to the collection over time. Hence, the relevance judgments become incomplete, and the judged relevant documents become a smaller random subset of the entire relevant document set. Also, in the cas...

Developing a Draft Mental Health Act

Objectives Mental health acts have been developed in different countries to protect human and civil rights of people with psychiatric disorders. In Iran, although there are some scattered laws within the existing body of laws, there is no separate mental health act. The aim of the present project was to prepare a draft pertaining to the mental health act in the country. Methods The draft of th...

Language Access to Distributed Data with Error Recovery

This paper discusses an effort in the application of artificial intelligence to the access of data from a large, distributed data base over a computer network. A running system is described that provides real time access over the ARPANET to a data base distributed over several machines. The system accepts a rather wide range of natu...

WHO Regional Office for Europe

The Meeting, convened to finalize the text of the document, was attended by representatives of professional associations and societies concerned with general practice/family medicine, representatives of the medical and nursing professions as a whole, and experts who had contributed to the preparation of the draft Charter. The Meeting discussed the concepts and intentions of the document, the re...

Publication date: 2009
